Method and device for error correction model training and text error correction

ABSTRACT

A computer-implemented method is performed at a device having one or more processors and memory storing programs executed by the one or more processors. The method comprises: selecting a target word in a target sentence; from the target sentence, acquiring a first sequence of words that precede the target word and a second sequence of words that succeed the target word; from a sentence database, searching and acquiring a group of words, each of which separates the first sequence of words from the second sequence of words in a sentence; creating a candidate sentence for each of the candidate words by replacing the target word in the target sentence with each of the candidate words; determining the fittest sentence among the candidate sentences according to a linguistic model; and suggesting the candidate word within the fittest sentence as a correction.

RELATED APPLICATIONS

This application is a continuation application of PCT Patent Application No. PCT/CN2013/086152, entitled “Method and Device for Error Correction Model Training and Text Error Correction” filed on Oct. 29, 2013, which claims priority to Chinese Patent Application No. 201310033697.8, “Method and Device for Error Correction Model Training and Text Error Correction”, filed on Jan. 29, 2013, both of which are hereby incorporated by reference in their entirety.

FIELD OF THE INVENTION

The present application relates to the technical field of information processing, and in particular to a method and device for error correction model training and text error correction.

BACKGROUND OF THE INVENTION

Text used in daily work and life often contains error character strings, such as wrongly written or mispronounced characters and misspelled words. How to recognize and correct the error character strings in a text by computer is a technical problem to be solved in the current technical field of information processing.

At present, there exist text correction programs based on language rules.

Specifically, in these programs, language rules of the target language (i.e., the language adopted by the target document), such as word collocation rules and word spelling rules, are summarized in advance. For example, when the target language is Chinese, the word collocation rules of Chinese are summarized in advance. The program then evaluates the text to be processed against the summarized language rules and judges whether the text conforms to them. When the evaluation result shows that the conformity of the text to be processed with the summarized language rules does not meet the predetermined requirements, the program conducts error correction processing on the text according to the summarized language rules.

It can be seen that the conventional text error correction program based on language rules requires many personnel with extensive language backgrounds to summarize a mass of language rules. Moreover, due to the complex structure of language itself, it is not easy to summarize language rules, and there are often conflicts between different summarized language rules. Therefore, the error recall rate of a text error correction program based on language rules is low, and the accuracy of its error correction is also low.

SUMMARY

The present application provides a text-processing method and apparatus based on context information of a word in a sentence to improve upon the accuracy and comprehensiveness of existing text-processing methods.

In accordance with some embodiments of the present application, a computer-implemented method is performed at a device having one or more processors and memory storing programs executed by the one or more processors. The method comprises: selecting a target word in a target sentence by first predefined criteria; from the target sentence, acquiring a first sequence of words that precede the target word and a second sequence of words that succeed the target word; from a sentence database, searching and acquiring a group of words, each of which separates the first sequence of words from the second sequence of words in a sentence; from the group of words, selecting candidate words whose similarity to the target word is above a pre-set threshold according to second predefined criteria; creating a candidate sentence for each of the candidate words by replacing the target word in the target sentence with each of the candidate words; determining the fittest sentence among the candidate sentences according to a linguistic model; and suggesting the candidate word within the fittest sentence as a correction.
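
The steps of this method can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names (words_between, suggest), the context window of two words, the use of difflib string similarity as the "second predefined criteria", and the caller-supplied score function standing in for the linguistic model are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def words_between(sentences, left, right):
    """Collect every word that separates `left` from `right` in some sentence."""
    found = set()
    n_left, n_right = len(left), len(right)
    for words in sentences:
        for i in range(n_left, len(words) - n_right):
            if words[i - n_left:i] == left and words[i + 1:i + 1 + n_right] == right:
                found.add(words[i])
    return found


def suggest(sentence, idx, sentences, score, window=2, threshold=0.5):
    """Suggest a replacement for sentence[idx], or None if no candidate qualifies.

    `score` is a hypothetical stand-in for the linguistic model: it takes a
    list of words and returns a number, higher meaning a better sentence.
    """
    target = sentence[idx]
    left = sentence[max(0, idx - window):idx]      # first sequence of words
    right = sentence[idx + 1:idx + 1 + window]     # second sequence of words
    group = words_between(sentences, left, right)
    # Assumed similarity criterion: spelling similarity above a threshold.
    candidates = [
        w for w in sorted(group)
        if w != target and SequenceMatcher(None, w, target).ratio() > threshold
    ]
    if not candidates:
        return None
    # Build one candidate sentence per candidate word and keep the fittest.
    return max(
        candidates,
        key=lambda w: score(sentence[:idx] + [w] + sentence[idx + 1:]),
    )
```

In practice the score function would be backed by a trained language model; any callable that ranks word sequences can be passed in.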

In accordance with some embodiments of the present application, a text-processing device comprises one or more processors; memory; and one or more programs stored in the memory and to be executed by the one or more processors. The one or more programs include instructions for: selecting a target word in a target sentence by first predefined criteria; from the target sentence, acquiring a first sequence of words that precede the target word and a second sequence of words that succeed the target word; from a sentence database, searching and acquiring a group of words, each of which separates the first sequence of words from the second sequence of words in a sentence; from the group of words, selecting candidate words whose similarity to the target word is above a pre-set threshold according to second predefined criteria; creating a candidate sentence for each of the candidate words by replacing the target word in the target sentence with each of the candidate words; determining the fittest sentence among the candidate sentences according to a linguistic model; and suggesting the candidate word within the fittest sentence as a correction.

BRIEF DESCRIPTION OF THE DRAWINGS

The aforementioned features and advantages of the invention as well as additional features and advantages thereof will be more clearly understood hereinafter as a result of a detailed description of preferred embodiments when taken in conjunction with the drawings.

FIG. 1 is a flowchart of a method of training error correction model in accordance with some embodiments;

FIG. 2 is a flowchart of a method of training error correction model in accordance with some embodiments;

FIG. 3 is a schematic structural diagram of a text-processing device in accordance with some embodiments;

FIG. 4 is a flowchart of a text-processing method in accordance with some embodiments;

FIG. 5 is a schematic structural diagram of a text-processing device in accordance with some embodiments;

FIG. 6 is a flowchart of a text-processing method in accordance with some embodiments;

FIG. 7 is a schematic structural diagram of a text-processing device in accordance with some embodiments;

FIG. 8 is a flowchart of a text-processing method in accordance with some embodiments;

FIG. 9 is a schematic structural diagram of a text-processing device in accordance with some embodiments.

Like reference numerals refer to corresponding parts throughout the several views of the drawings.

DESCRIPTION OF EMBODIMENTS

Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. But it will be apparent to one skilled in the art that the subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.

In accordance with some embodiments of the present application, a text-processing program conducts the error correction processing according to the context information of a character string. Specifically, the program recognizes the error character strings appearing in some contexts by analyzing the similarity between correct character strings and character strings to be processed that have the same context information, and replaces the error character strings appearing in those contexts with the corresponding correct character strings. For recognition and correction purposes, a character string is usually a word consisting of one or more characters.

In accordance with some embodiments, the error correction model can be established in advance according to the context information of character strings and the similarity among character strings; during the actual error correction of the text to be processed, the error correction processing is then conducted according to the error correction rules of the error correction model. Alternatively, the program can recognize an error character string and replace it with the corresponding correct character string directly during the actual error correction of the text to be processed, based on the context information of character strings and the similarity among character strings.

FIG. 1 is a flowchart of a method of training error correction model in accordance with some embodiments.

As shown in FIG. 1, the method includes:

Step 101, searching for context information of a correct character string in a training text collection; taking that context information as effective context information; and storing all of the correct character strings corresponding to each piece of effective context information.

Step 102, searching the training text collection for character strings to be processed that have the effective context information and whose similarity to the correct character strings meets the predetermined requirements.

Step 103, generating error correction rules according to the character strings to be processed, the correct character strings whose similarity to the character strings to be processed meets the predetermined requirements, and the effective context information shared by the character strings to be processed and the correct character strings; and establishing an error correction model according to the test results of the error correction rules.

The training text collection can include a first text collection, a second text collection and a third text collection. The training method shown in FIG. 1 can be further specified; refer to the flowchart shown in FIG. 2 for more information.

FIG. 2 is a flowchart of a method of training error correction model in accordance with some embodiments.

As is shown in FIG. 2, the method includes:

Step 201, according to predetermined rules, searching for context information of a preset correct character string in the first text collection.

In accordance with some embodiments, a text-processing program generally takes the words in a preset dictionary as correct character strings. Other methods of determining correct character strings are acceptable as well. Here, a word can be a word or phrase formed by multiple characters, or a single character.

Step 202, taking the mentioned context information as effective context information, and storing all of the correct character strings corresponding to each piece of effective context information.

In this step, all of the effective context information corresponding to each correct character string can also be stored, which makes it convenient to look up all of the effective context information corresponding to a specified correct character string.
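
Steps 201 and 202 can be sketched as building two lookup tables. This is an illustrative sketch only: it assumes the text is already segmented into words, that the context is a fixed window of words before and after the string, and the name build_context_index is hypothetical.

```python
from collections import defaultdict


def build_context_index(sentences, dictionary, window=1):
    """Index the contexts in which dictionary words (correct strings) occur.

    Returns two mappings:
      context -> set of correct strings seen in that context
      correct string -> set of contexts it was seen in
    A context is a (words_before, words_after) pair of tuples.
    """
    context_to_strings = defaultdict(set)
    string_to_contexts = defaultdict(set)
    for words in sentences:
        for i, word in enumerate(words):
            if word not in dictionary:
                continue
            context = (
                tuple(words[max(0, i - window):i]),
                tuple(words[i + 1:i + 1 + window]),
            )
            context_to_strings[context].add(word)
            string_to_contexts[word].add(context)
    return context_to_strings, string_to_contexts
```

Keeping both directions avoids a full scan when all effective contexts of one correct character string are needed.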

Step 203, searching for the character string to be processed from the second text collection.

In this step, to limit the scope of the character strings to be processed and thereby accelerate the establishment of the error correction model, the text-processing program can search the training text collection only for character strings to be processed whose length falls within the length range of the words in the preset dictionary.

Step 204, determining whether context information of the character string to be processed in the second text collection includes effective context information.

In this step, according to the mentioned predetermined rules, the text-processing program searches for the context information of the character string to be processed from the training text collection, and judges whether that context information is effective context information according to how well the context of the character string to be processed matches an effective context.

A character matching algorithm can be adopted to match the context of the character string to be processed with the effective context directly, or to match them after converting both the context of the character string to be processed and the effective context into other equivalent information.
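
Steps 203 and 204 can be sketched as follows, assuming exact matching of word-window contexts against an index that maps each effective context to its correct character strings. The function name find_candidates and the exact-match policy are assumptions for illustration; the text above also allows matching after conversion to equivalent information.

```python
def find_candidates(sentences, context_to_strings, dictionary, window=1):
    """Find (string_to_process, correct_string, context) triples.

    Only strings whose length lies within the length range of the dictionary
    words are considered, which limits the search scope.
    """
    min_len = min(len(w) for w in dictionary)
    max_len = max(len(w) for w in dictionary)
    triples = set()
    for words in sentences:
        for i, word in enumerate(words):
            if not (min_len <= len(word) <= max_len):
                continue
            context = (
                tuple(words[max(0, i - window):i]),
                tuple(words[i + 1:i + 1 + window]),
            )
            # Exact lookup: the context must equal an effective context.
            for correct in context_to_strings.get(context, ()):
                if correct != word:
                    triples.add((word, correct, context))
    return triples
```

The triples returned here still need the similarity check of Step 205 before any rule is generated.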

Step 205, when context information of the character string to be processed in the second text collection includes effective context information, the text-processing program judges whether the similarity between the character string to be processed and the correct character string corresponding to this effective context information meets predetermined requirements.

When judging whether the similarity between the character string to be processed and the correct character string with the same effective context information meets predetermined requirements, the text-processing program judges according to the pronunciation of the character string to be processed and the correct character string, or according to the character pattern of the character string to be processed and the correct character string. For example, if the pronunciation or character pattern is similar, then the character string to be processed and the correct character string are determined to be similar strings with each other.

Specifically, for the character string to be processed and the correct character string with the same effective context information, the text-processing program judges, according to the pronunciation dictionary, whether the similarity between the pronunciation of the character string to be processed and the pronunciation of the correct character string meets predetermined requirements. If the pronunciations are similar, the character string to be processed and the correct character string are determined to be similar strings with each other.

Alternatively, for the character string to be processed and the correct character string with the same effective context information, the text-processing program judges whether the similarity between the character pattern of the character string to be processed and the character pattern of the correct character string meets predetermined requirements. If yes, the character string to be processed and the correct character string are determined to be similar strings with each other.
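
The similarity judgment of Step 205 might be sketched as below. The sketch makes several assumptions that are not in the source: the pronunciation dictionary is a plain mapping from a string to a phonetic transcription, similarity is measured with difflib's ratio, spelling similarity stands in for character-pattern similarity, and the 0.8 threshold is arbitrary.

```python
from difflib import SequenceMatcher


def is_similar(candidate, correct, pronunciations, threshold=0.8):
    """Return True if the two strings sound alike or look alike.

    `pronunciations` is a hypothetical pronunciation dictionary mapping a
    string to its phonetic transcription.
    """
    p1 = pronunciations.get(candidate)
    p2 = pronunciations.get(correct)
    # Pronunciation check, only possible when both strings are in the dictionary.
    if p1 is not None and p2 is not None:
        if SequenceMatcher(None, p1, p2).ratio() >= threshold:
            return True
    # Character-pattern check, approximated here by spelling similarity.
    return SequenceMatcher(None, candidate, correct).ratio() >= threshold
```

For a language such as Chinese, the character-pattern check would more plausibly compare glyph components than spellings.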

Step 206, based on the character string to be processed and the correct character string whose mutual similarity meets predetermined requirements, as well as the effective context information shared by the character string to be processed and the correct character string, the text-processing program generates error correction rules to be tested.

For each pair of a character string to be processed and a correct character string that have the same effective context information and whose mutual similarity meets predetermined requirements (i.e., the similarity is above a pre-set threshold), the error correction rules to be tested include: the first error correction rules, used for replacing the character string to be processed with the correct character string; and/or the second error correction rules, used for replacing the character string to be processed together with its effective context information with the correct character string together with the same effective context information.

In other words, each pair of a character string to be processed and a correct character string that have the same effective context information and whose mutual similarity meets predetermined requirements has one first error correction rule and one or more second error correction rules. When the character string to be processed and the correct character string share two or more pieces of effective context information, the two strings combine with each piece of shared effective context information into a different second error correction rule.

For example, a correct character string B has effective contexts C and D in the first text collection. A character string A to be processed also has effective contexts C and D in the second text collection, and the similarity of the character string A to be processed and the correct character string B meets predetermined requirements. Then the error correction rules corresponding to the character string A to be processed and the correct character string B include: replacing the character string A to be processed with the correct character string B; replacing the character string A to be processed and its context C with the correct character string B and its context C; and replacing the character string A to be processed and its context D with the correct character string B and its context D.
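
The A, B, C, D example can be written out as a small sketch. The rule representation (a dictionary with match and replace word tuples) and the function name generate_rules are assumptions chosen for illustration.

```python
def generate_rules(wrong, correct, shared_contexts):
    """Generate the rules to be tested for one (wrong, correct) pair.

    `shared_contexts` is a list of (words_before, words_after) tuples that
    both strings were observed in. One first rule is produced, plus one
    second rule per shared context.
    """
    rules = [{"kind": "first", "match": (wrong,), "replace": (correct,)}]
    for before, after in shared_contexts:
        rules.append({
            "kind": "second",
            "match": before + (wrong,) + after,
            "replace": before + (correct,) + after,
        })
    return rules


# String A, correct string B, shared contexts C and D.
context_c = (("c1",), ("c2",))
context_d = (("d1",), ("d2",))
example_rules = generate_rules("A", "B", [context_c, context_d])
```

With two shared contexts the pair yields three rules, matching the three replacements listed in the example.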

Step 207, conducting error correction processing for the third text collection by using the error correction rules to be tested, and establishing the error correction model based on assessment information of the error correction processing results. The error correction model includes those error correction rules whose error correction processing results have assessment information that meets predetermined conditions.

In this step, for each pair of a character string to be processed and a correct character string found in Step 205 that have the same effective context information and whose mutual similarity meets predetermined requirements, the text-processing program replaces, according to the first error correction rules, the character string to be processed in the training text collection with the correct character string to obtain the first replacing result, and judges whether the assessment result of the first replacing result meets predetermined conditions. If the predetermined conditions are met, the first error correction rules pass the assessment. If not, the first error correction rules are dropped.

Based on the second error correction rules, the text-processing program replaces the character string to be processed in the third text collection and its effective context information with the correct character string and effective context information to obtain the second replacing result. Then the text-processing program judges whether the assessment result of the second replacing result meets predetermined conditions. If the predetermined conditions are met, the second error correction rules pass the assessment. If not, the second error correction rules are dropped. The established error correction model includes the error correction rules that passed.

For each pair of a character string to be processed and a correct character string found in Step 205 that have the same effective context information and whose mutual similarity meets predetermined requirements, if the first error correction rules corresponding to the pair pass the assessment, then it is generally unnecessary to assess the other error correction rules corresponding to that pair.

The present application does not limit the specific method of assessment for the replaced results. For example, the replaced results can be assessed according to language rules or a pre-established language model. The replaced results can also be assessed manually.
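
The assessment in Step 207 can be sketched as follows. Because the assessment method is left open, the sketch takes it as a caller-supplied function passes, which is hypothetical; rules are assumed to be (match, replace) pairs of non-empty word tuples, with the first rule listed first in each group.

```python
def apply_rule(words, match, replace):
    """Replace every occurrence of the word sequence `match` with `replace`."""
    out, i, n = [], 0, len(match)
    while i < len(words):
        if tuple(words[i:i + n]) == match:
            out.extend(replace)
            i += n
        else:
            out.append(words[i])
            i += 1
    return out


def build_model(rule_groups, test_sentences, passes):
    """Keep only the rules whose replacing results pass the assessment.

    Each group holds the rules for one string pair: the first rule followed
    by its second rules. If the first rule passes, the second rules of that
    group are not assessed.
    """
    model = []
    for first, *seconds in rule_groups:
        def ok(rule):
            results = [apply_rule(s, *rule) for s in test_sentences]
            return passes(results)

        if ok(first):
            model.append(first)
            continue
        model.extend(rule for rule in seconds if ok(rule))
    return model
```

A context-free first rule is broader, so it fails more easily on text where the string is sometimes correct; the context-bound second rules then serve as the narrower fallback.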

In the present application, the context information of the character string includes the text in front of the character string (context information in front of string for short) and the text after the character string (context information after string for short).

For any target character string (for example, a certain correct character string, or a certain character string to be processed), there are many methods to determine the context information of this target character string. For example: the character string with predetermined length in front of and/or after the target character string can be determined as the context information of the target character string; or, several predetermined words appearing before and/or after the target character string can be located according to a dictionary and determined as the context information of the target character string; or, according to the semantic features of the target character string, context information can be selected for the target character string based on predetermined language rules. These methods of determining the context information of the target character string can be used separately or in combination with each other.
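
The first of these methods, a fixed-length character window, might look like the sketch below. The function name fixed_length_context and the default length are assumptions; the dictionary-based and semantic methods are not shown.

```python
def fixed_length_context(text, start, end, length=2):
    """Return the characters before and after text[start:end].

    `length` is the predetermined context length in characters; the window
    is shortened automatically at the beginning or end of the text.
    """
    before = text[max(0, start - length):start]
    after = text[end:end + length]
    return before, after
```

For unsegmented text such as Chinese, a character window avoids the need for word segmentation before the context is extracted.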

For the text collections used in the method shown in FIG. 2, the first text collection, the second text collection and the third text collection can be the same collection, which includes a certain proportion of error character strings but consists mostly of correct character strings.

Alternatively, the first text collection can be a text collection different from the second text collection and the third text collection. The accuracy of the text in the first text collection is higher than the accuracy of the text in the second text collection and the third text collection. The second text collection and the third text collection can be the same text collection or different text collections. The richer and broader the source material of the text collections used in the method shown in FIG. 2 is, the better the error correction effect of the established error correction model is.

FIG. 3 is a schematic structural diagram of a text-processing device in accordance with some embodiments.

As shown in FIG. 3, this device includes effective context collection module 301, similar string search module 302 and model establishment module 303.

Effective context collection module 301 is configured to search for the context information of a correct character string in the training text collection, to use that context information as the effective context information, and to store all correct character strings corresponding to each piece of effective context information.

Similar string search module 302 is configured to search the training text collection for character strings to be processed that have the effective context information and whose similarity to the correct character strings satisfies the predetermined requirements.

Model establishment module 303 is configured to generate error correction rules according to the character strings to be processed, the correct character strings whose similarity to the character strings to be processed meets the predetermined requirements, and the effective context information shared by the character strings to be processed and the correct character strings, and to establish the error correction model according to the test results of the error correction rules.

Effective context collection module 301 is configured to search for the context information of the preset correct character strings in the first text collection based on the predetermined rules, to use that context information as the effective context information, and to store all correct character strings corresponding to each piece of effective context information.

Similar string search module 302 is configured to search for the character strings to be processed from the second text collection, and to determine whether the context information of the character strings to be processed in the second text collection includes effective context information. Similar string search module 302 is also configured to judge whether the similarity between the character strings to be processed and the correct character strings corresponding to the effective context information satisfies the predetermined requirements.

Model establishment module 303 is also configured to generate error correction rules to be tested based on the character strings to be processed and the correct character strings whose mutual similarity satisfies the predetermined requirements, together with the effective context information they share; to use the error correction rules to be tested to conduct error correction processing for the third text collection; and to establish the error correction model based on the assessment information of the error correction processing results. The error correction model includes the error correction rules whose error correction processing results have assessment information that satisfies the predetermined conditions.

The error correction rules to be tested include: the first error correction rules, used for replacing the character string to be processed with the correct character string whose mutual similarity with it meets predetermined requirements; and/or the second error correction rules, used for replacing the character string to be processed and its effective context information with the correct character string and the same effective context information, where the correct character string has that effective context information and its similarity to the character string to be processed meets predetermined requirements.

The mentioned preset correct character strings can include the words in the preset dictionary.

Similar string search module 302 is configured to search the training text collection for character strings to be processed whose length falls within the length range of the words in the preset dictionary.

Similar string search module 302 is configured to search for the context information of the character string to be processed from the training text collection according to the mentioned predetermined rules, and to judge whether that context information is effective context information according to how well the context of the character string to be processed matches an effective context.

The mentioned context information includes the context information in front of string and/or the context information after string.

The mentioned predetermined rules for searching context information include: determining the character strings with predetermined length in front of and/or after the target character string as the context information of the target character string; or, locating several predetermined words appearing before and/or after the target character string according to a dictionary and determining those words as the context information of the target character string; or, according to the semantic features of the target character string, selecting context information for the target character string based on predetermined language rules.

Similar string search module 302 is configured to judge, according to a pronunciation dictionary, whether the similarity between the pronunciation of the character string to be processed and the pronunciation of the correct character string meets predetermined requirements. In addition, similar string search module 302 is configured to judge, according to a glyph dictionary, whether the similarity between the glyph of the character string to be processed and the glyph of the correct character string meets predetermined requirements.

Model establishment module 303 is configured, for the character strings to be processed and the correct character strings whose mutual similarity satisfies the predetermined requirements, to replace the character string to be processed in the training text collection with the correct character string according to the first error correction rules to obtain the first replacing result, and to judge whether the assessment result of the first replacing result meets predetermined conditions. If the predetermined conditions are met, the first error correction rules pass the assessment. If not, the first error correction rules are dropped.

In addition, the model establishment module 303 is configured to replace the character string to be processed in the training text collection and its effective context information with the correct character string and effective context information to obtain the second replacing result according to the second error correction rules. The model establishment module 303 is further configured to judge whether the assessment result of the second replacing result meets predetermined conditions. If yes, the second error correction rules pass the assessment. If not, the second error correction rules are dropped. The established error correction model includes the error correction rules that passed.

The first text collection, the second text collection and the third text collection can be the same collection. Alternatively, the accuracy of the text in the first text collection is higher than the accuracy of the text in the second text collection and the third text collection. The second text collection and the third text collection can be the same text collection or different text collections.

Based on the aforementioned methods of training the error correction model, the present application also provides a text error correction method. In the text error correction method, character strings are searched for in the text to be processed according to the error correction rules stored in the error correction model, and error correction processing is conducted for the found character strings according to the error correction rules.

For the method of conducting text error correction based on the error correction model provided by the present application, refer to FIG. 4.

FIG. 4 is a flowchart of a text-processing method in accordance with some embodiments.

As shown in FIG. 4, this flowchart diagram includes:

Step 401, the text-processing program searches for the character string to be processed from the text to be processed based on the first error correction rules stored in the error correction model, and searches for the character string to be processed and its effective context information from the text to be processed based on the second error correction rules stored in the error correction model.

Step 402, the text-processing program replaces a character string to be processed with the correct character string based on the first error correction rules, and, based on the second error correction rules, replaces the character string to be processed and its effective context information with the correct character string and the same effective context information, where the correct character string has that effective context information and its similarity to the character string to be processed meets predetermined requirements.

The first error correction rules include replacing a character string to be processed with the correct character string whose similarity to it satisfies the predetermined requirements. The second error correction rules include replacing a character string to be processed and its effective context information with the correct character string and the same effective context information, where the correct character string has that effective context information and its similarity to the character string to be processed satisfies the predetermined requirements. The effective context information is context information of the correct character strings in the training text collection that is shared, within the training text collection, by the character strings to be processed and the correct character strings whose similarity satisfies the predetermined requirements. The training text collection is the text collection used to train the error correction model.

The device to conduct text error correction based on the error correction model provided by the present application can include an error correction model module and an error correction processing module.

The error correction model module is configured to store error correction rules. The error correction model is obtained by training through the following steps: searching for the context information of correct character strings in the training text collection, and using the mentioned context information as the effective context information to store all correct character strings corresponding to each effective context information; searching the training text collection for character strings to be processed that have the mentioned effective context information and whose similarity to the correct character strings satisfies the predetermined requirements; generating error correction rules according to the character strings to be processed, the correct character strings whose similarity to the character strings to be processed satisfies the predetermined requirements, and the effective context information shared by the character strings to be processed and the correct character strings; and establishing the error correction model according to the test result of the error correction rules.
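The training steps above can be sketched as follows. This is a simplified illustration under stated assumptions: sentences are already split into words, the effective context is taken to be the immediately preceding and following word, similarity is approximated by the spelling ratio from Python's `difflib`, and the final step of testing the generated rules is omitted. The function name `train_rules` and the threshold value are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def train_rules(sentences, dictionary, threshold=0.7):
    """Return rules as (left context, wrong string, right context, correct string)."""
    # Pass 1: store all correct strings found under each effective context.
    context_to_correct = defaultdict(set)
    for words in sentences:
        for i in range(1, len(words) - 1):
            if words[i] in dictionary:
                context_to_correct[(words[i - 1], words[i + 1])].add(words[i])

    # Pass 2: find strings outside the dictionary that share a context with a
    # correct string and are similar enough to it, and emit a rule for each.
    rules = set()
    for words in sentences:
        for i in range(1, len(words) - 1):
            word = words[i]
            if word in dictionary:
                continue
            context = (words[i - 1], words[i + 1])
            for correct in context_to_correct.get(context, ()):
                if SequenceMatcher(None, word, correct).ratio() >= threshold:
                    rules.add((words[i - 1], word, words[i + 1], correct))
    return rules


training_text = [
    "I will recieve the package".split(),
    "you will receive the letter".split(),
]
known_words = {"I", "you", "will", "receive", "the", "package", "letter"}
print(train_rules(training_text, known_words))
```

Here the misspelled "recieve" shares the context ("will", "the") with the dictionary word "receive", so one rule is generated for that context.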

The error correction processing module is configured to search for character strings in the text to be processed according to the error correction rules stored in the error correction model, and to conduct error correction processing on the character strings found according to the error correction rules.

For the specific structure of the device to conduct text error correction based on the error correction model provided by the present application, reference can also be made to FIG. 5.

FIG. 5 is a schematic structural diagram of a text-processing device in accordance with some embodiments.

As shown in FIG. 5, the text error correction device includes error correction model module 501, search module 502 and replacing module 503.

Error correction model module 501 is configured to store error correction rules. The error correction rules include the first error correction rules, which replace a character string to be processed with a correct character string whose similarity to it satisfies the predetermined requirements, and/or the second error correction rules, which replace the character string to be processed and its effective context information with the mentioned effective context information and the correct character string that has the mentioned effective context information and whose similarity to the character string to be processed satisfies the predetermined requirements. The mentioned effective context information is the context information of the correct character strings in the training text collection, namely the context information shared, in the mentioned training text collection, by a character string to be processed and a correct character string whose similarity satisfies the predetermined requirements. The mentioned training text collection is the text collection configured to train the error correction model.

Search module 502 is configured to search the text to be processed for character strings to be processed based on the first error correction rules, and to search the text to be processed for character strings to be processed together with their effective context information based on the second error correction rules.

Replacing module 503 is configured to replace the character string to be processed with the correct character string based on the first error correction rules and, based on the second error correction rules, to replace the character string to be processed and its effective context information with the mentioned effective context information and the correct character string that has the mentioned effective context information and whose similarity to the character string to be processed meets predetermined requirements.

As described in FIGS. 1-5, when the error correction model is established in advance based on context information of character strings and similarity among character strings, the practical error correction process of the text to be processed can be conducted directly based on the error correction rules in the error correction model. Because the searching and matching of context information of character strings, the judgment of similarity among character strings, the evaluation of error correction rules and other tasks are carried out while establishing the error correction model, the actual error correction speed of the text to be processed is greatly accelerated.

The present application makes it possible to recognize error character strings and replace an error character string with a corresponding correct character string based on context information of the character string and similarity among character strings during the practical error correction process of the text to be processed. Refer to FIG. 6 and FIG. 7 for specific information.

FIG. 6 is a flowchart of a text-processing method in accordance with some embodiments.

As shown in FIG. 6, this flowchart diagram includes:

Step 601, taking context information of a correct character string as effective context information in advance, and storing all of the correct character strings corresponding to each effective context information.

The correct character strings generally include predetermined words in a dictionary, and the mentioned effective context information is the context information of a correct character string in a predetermined training text collection.

Step 602, searching the text to be processed for a character string to be processed having the mentioned effective context information, and judging whether the similarity between the character string to be processed and the correct character string having the same effective context information as the character string to be processed meets predetermined requirements.

In this step, the text-processing program judges, according to a pronunciation dictionary, whether the similarity between the pronunciation of the character string to be processed and the pronunciation of the correct character string meets predetermined requirements. Alternatively, the text-processing program judges, according to a glyph dictionary, whether the similarity between the glyph of the character string to be processed and the glyph of the correct character string meets predetermined requirements.
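A pronunciation-based similarity judgment of this kind might look like the following sketch. The toy pronunciation dictionary (toneless pinyin), the use of `difflib.SequenceMatcher` as the similarity measure, the threshold value and the function names are all assumptions for illustration; the application does not specify how similarity is computed.

```python
from difflib import SequenceMatcher

# Toy pronunciation dictionary (toneless pinyin); contents are illustrative.
PRONUNCIATIONS = {
    "再见": "zai jian",
    "在见": "zai jian",
    "你好": "ni hao",
}


def pronunciation_similarity(a: str, b: str, pron=PRONUNCIATIONS) -> float:
    """Similarity in [0, 1] between the pronunciations of two strings."""
    pa, pb = pron.get(a), pron.get(b)
    if pa is None or pb is None:
        return 0.0  # no pronunciation entry: treat as not similar
    return SequenceMatcher(None, pa, pb).ratio()


def meets_requirement(a: str, b: str, threshold: float = 0.8) -> bool:
    """Judge whether the similarity meets the (assumed) threshold."""
    return pronunciation_similarity(a, b) >= threshold


print(meets_requirement("在见", "再见"))  # True: identical pronunciation
print(meets_requirement("你好", "再见"))  # False: pronunciations differ
```

A glyph-based judgment could follow the same shape, with the dictionary mapping each string to a glyph description instead of a pronunciation.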

Step 603, when the mentioned similarity meets predetermined requirements, replacing the character string to be processed with the correct character string, or replacing both the character string to be processed and the mentioned effective context information with the correct character string and the mentioned effective context information.

In this step, when the similarity meets predetermined requirements, the text-processing program replaces the character string to be processed with the correct character string to obtain the first replacing result. When the assessment result of the first replacing result meets predetermined requirements, the text-processing program determines the first replacing result as the final error correction result. When the assessment result of the first replacing result fails to meet predetermined requirements, the text-processing program replaces both the character string to be processed and the mentioned effective context information with the correct character string and the effective context information to obtain the second replacing result. When the assessment result of the second replacing result meets predetermined requirements, the text-processing program determines the second replacing result as the final error correction result. When the assessment result of the second replacing result fails to meet predetermined requirements, the text-processing program keeps the character string to be processed unchanged or conducts other error correction processing.

FIG. 7 is a schematic structural diagram of a text-processing device in accordance with some embodiments.

As shown in FIG. 7, this device includes storage module 701, similar string search module 702 and error correction module 703.

Storage module 701 is configured to take context information of a correct character string as effective context information in advance, and to store all of the correct character strings corresponding to each effective context information.

Similar string search module 702 is configured to search the text to be processed for a character string to be processed having the mentioned effective context information, and to judge whether the similarity between the character string to be processed and the correct character string having the same effective context information as the character string to be processed meets predetermined requirements.

Error correction module 703 is configured to replace the character string to be processed with the correct character string when the mentioned similarity meets predetermined requirements, or replace both the character string to be processed and the mentioned effective context information with the correct character string and the mentioned effective context information.

Similar string search module 702 is configured to judge, according to a pronunciation dictionary, whether the similarity between the pronunciation of the character string to be processed and the pronunciation of the correct character string having the same effective context information as the character string to be processed meets predetermined requirements, or to judge, according to a glyph dictionary, whether the similarity between the glyph of the character string to be processed and the glyph of the correct character string having the same effective context information as the character string to be processed meets predetermined requirements.

Error correction module 703 is configured to replace the character string to be processed with the correct character string to obtain the first replacing result when the mentioned similarity meets predetermined requirements. The error correction module 703 is further configured to determine the first replacing result as the final error correction result when the assessment result of the first replacing result meets predetermined requirements. The error correction module 703 is further configured to, when the assessment result of the first replacing result fails to meet predetermined requirements, replace both the character string to be processed and the mentioned effective context information with the correct character string and the mentioned effective context information to obtain the second replacing result. The error correction module 703 is further configured to, when the assessment result of the second replacing result meets predetermined requirements, determine the second replacing result as the final error correction result.

FIG. 8 is a flowchart of a text-processing method in accordance with some embodiments. The method is performed at a device (e.g., device 900 as shown in FIG. 9) having one or more processors and memory storing programs executed by the one or more processors. In some embodiments, this text-processing method is performed by an independent program processing given text. In accordance with some other embodiments, this text-processing method works as a module in, or in combination with, another text-processing program or text-input program. Text-input programs include any program that receives text as input, e.g., an online chatting program.

In step 801, a text-processing program selects a target word in a target sentence by first predefined criteria. The target word and/or target sentence can be selected by the user, in which case the first predefined criteria acknowledge the user selection. In accordance with some embodiments, the text-processing program also selects a target word because the target word is deemed to be possibly wrong.

In Chinese text, a word consists of one or more Chinese characters, and word recognition is needed to determine whether a character string is a word. The first predefined criteria include recognizing a word having a few Chinese characters based at least on Chinese grammar. The first predefined criteria include a word recognition algorithm (e.g., the word recognition algorithm 940 shown in FIG. 9) to recognize a combination of more than one character as one word.

In addition, not all words in a sentence need further processing by a text-processing method. The recognition algorithm selects a few words from a sentence for further processing in order to increase efficiency. The selected words are deemed to be more likely to be wrong than others in the target sentence. The selection is based on language rules including grammar.

In step 802, the text-processing program acquires from the target sentence a first sequence of words that precede the target word and a second sequence of words that succeed the target word.

One way of acquiring the first and second sequences of words is to acquire, from the target sentence, all words before the target word as the first sequence of words and all words after the target word as the second sequence of words.

Another way of acquiring the first and second sequences of words is to acquire fixed lengths of words before and after the target word. The lengths can be measured by number of characters, symbols, letters, etc. The optimal length is an empirical question that requires repeated testing and may depend on the circumstances. In theory, longer fixed lengths reflect the role of the target word in the target sentence more comprehensively but also, as shown in subsequent steps, make the search more time-consuming and the sentence pool smaller. In addition, the further away a word is located in the target sentence from the target word, the less value it has in the process. Therefore, a person skilled in the art will recognize that a balance can be achieved through repeated testing of different lengths.
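Both of the above ways of acquiring the sequences can be illustrated with one small sketch. It assumes the sentence is already split into words and that lengths are counted in words (the text also allows characters, symbols or letters); the function name `context_windows` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def context_windows(words: List[str],
                    target_index: int,
                    before: Optional[int] = None,
                    after: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Return (first sequence, second sequence) around the target word.

    With before/after left as None, all preceding and succeeding words are
    taken; otherwise at most `before` and `after` words are taken.
    """
    start = 0 if before is None else max(0, target_index - before)
    stop = len(words) if after is None else target_index + 1 + after
    first = words[start:target_index]
    second = words[target_index + 1:stop]
    return first, second


sentence = "I want to by a new car today".split()
print(context_windows(sentence, 3))        # all words on each side
print(context_windows(sentence, 3, 2, 2))  # fixed length of two words
```

With a window of two words the target word "by" is surrounded by "want to" and "a new", which is a much easier pattern to find in a sentence database than the full sentence.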

Yet another and more complex way of acquiring the first and second sequences of words is to determine the lengths of the first and second sequences of words based on the meaning of the target word and the words before or after the target word. Based on the meaning of the target word, the program roughly determines that the meaning of the words beyond the lengths has no relationship with the meaning of the target word and excludes words beyond the lengths from the first and second sequences of words.

In step 803, the text-processing program, from a sentence database, searches and acquires a group of words, each of which separates the first sequence of words from the second sequence of words in a sentence. The program searches for sentences containing the first sequence of words, one word and the second sequence of words, in that order. The search is conducted in a sentence database that comprises millions or billions of sentences. The search result provides all sentences with the first and second sequences of words and a word separating the two sequences of words in the correct order. The text-processing program then acquires a group of words, each of which separates the first sequence of words from the second sequence of words in a sentence.
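The search in step 803 can be sketched as a linear scan, assuming a small in-memory database of sentences already split into words. A database of millions or billions of sentences would need an index rather than a scan; the function name `find_separating_words` and the sample sentences are hypothetical.

```python
from typing import Iterable, List, Set


def find_separating_words(database: Iterable[List[str]],
                          first: List[str],
                          second: List[str]) -> Set[str]:
    """Collect every word that sits exactly between `first` and `second`."""
    n1, n2 = len(first), len(second)
    span = n1 + 1 + n2  # first sequence + one word + second sequence
    found = set()
    for sentence in database:
        for start in range(len(sentence) - span + 1):
            middle = start + n1
            if (sentence[start:middle] == first
                    and sentence[middle + 1:start + span] == second):
                found.add(sentence[middle])
    return found


database = [
    "I want to buy a new car".split(),
    "we want to build a new home".split(),
    "she likes to buy a new hat".split(),  # first sequence does not match
]
print(sorted(find_separating_words(database, ["want", "to"], ["a", "new"])))
# prints: ['build', 'buy']
```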

In step 804, the text-processing program, from the group of words, selects candidate words whose similarity to the target word is above a pre-set threshold according to second predefined criteria. In accordance with some embodiments, the second predefined criteria include the length of words, the pronunciations of words, the meaning of words, the ease of confusion between one word and the target word, etc.
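As a rough illustration of step 804, the sketch below filters the group of words by a similarity threshold. It substitutes a plain spelling similarity from Python's `difflib` for the second predefined criteria, which in the text may also cover length, pronunciation, meaning and ease of confusion; the function name `select_candidates` and the threshold value are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def select_candidates(group: Iterable[str],
                      target: str,
                      threshold: float = 0.5) -> List[str]:
    """Keep words whose similarity to the target is above the threshold."""
    return [word for word in sorted(group)
            if SequenceMatcher(None, word, target).ratio() > threshold]


print(select_candidates({"buy", "build", "sell"}, "by"))
# prints: ['buy']
```

Only "buy" is close enough to the target word "by" under this measure, so the later sentence-level evaluation has a single candidate to consider.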

In step 805, the text-processing program creates a candidate sentence for each of the candidate words by replacing the target word in the target sentence with each of the candidate words. Replacing the target word with a candidate word creates a new candidate sentence so that the evaluation of the candidate word is conducted on a sentence level.

In step 806, the text-processing program determines the fittest sentence among the candidate sentences according to a linguistic model (e.g., the linguistic model 946 in FIG. 9). In accordance with some embodiments, the text-processing program also compares the fittest sentence with the target sentence according to the linguistic model.

In accordance with some embodiments, the linguistic model includes criteria for grammar and other language rules, the meaning of the candidate sentence, the frequency of every candidate sentence appearing in the sentence database, etc. In accordance with some embodiments, the candidate sentences are first evaluated based on whether they conform to the rules of the language. Some candidate sentences are eliminated because the candidate words, while existing in some sentences in the sentence database that contain the first and second sequences of words, break grammar or other language rules in the target sentence. In the next step, the meaning of the candidate sentences is evaluated. If there are other sentences in the text, the model evaluates whether the meaning of each candidate sentence is compatible with the others. Lastly, for the remaining sentences, the model looks up the frequencies with which the remaining sentences appear in the sentence database. A higher frequency of a candidate sentence indicates a higher possibility that the candidate sentence is the sentence that the writer of the target sentence intends to write.
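The creation of candidate sentences and the final frequency-based ranking can be sketched as follows. The grammar and meaning stages of the linguistic model are deliberately left out, and frequency is counted as exact whole-sentence matches in a small in-memory database; these simplifications and the function names `candidate_sentences` and `fittest_sentence` are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def candidate_sentences(words: List[str],
                        target_index: int,
                        candidates: Iterable[str]) -> Dict[str, List[str]]:
    """Replace the target word with each candidate word in turn."""
    return {c: words[:target_index] + [c] + words[target_index + 1:]
            for c in candidates}


def fittest_sentence(words: List[str],
                     target_index: int,
                     candidates: Iterable[str],
                     database: Iterable[List[str]]
                     ) -> Optional[Tuple[str, List[str]]]:
    """Return (candidate word, candidate sentence) with the highest frequency."""
    sentences = candidate_sentences(words, target_index, candidates)
    if not sentences:
        return None  # no candidate words, nothing to suggest
    counts = Counter(tuple(s) for s in database)
    best = max(sentences, key=lambda c: counts[tuple(sentences[c])])
    return best, sentences[best]


database = [
    "I want to buy a car".split(),
    "I want to buy a car".split(),
    "I want to bye a car".split(),
]
target = "I want to by a car".split()
print(fittest_sentence(target, 3, ["buy", "bye"], database))
# prints: ('buy', ['I', 'want', 'to', 'buy', 'a', 'car'])
```

Because exact whole-sentence matches are rare in practice, a fuller implementation would likely rely on the grammar and meaning criteria described above or on partial-sentence statistics before falling back to raw frequency.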

In step 807, the text-processing program suggests the candidate word within the fittest sentence as a correction. In accordance with some embodiments, after suggesting the candidate word within the fittest sentence, the device replaces the target word in the target sentence with the suggested candidate word. Alternatively, the suggested word is shown to the user of the text-processing program as a choice.

FIG. 9 is a diagram of an example implementation of a text-processing device in accordance with some embodiments. While certain specific features are illustrated, those skilled in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the implementations disclosed herein. To that end, the server computer 900 includes one or more processing units (CPU's) 902, one or more network or other communications interfaces 908, a display 901, memory 905, and one or more communication buses 904 for interconnecting these and various other components. The communication buses may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. The memory 905 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. The memory 905 may optionally include one or more storage devices remotely located from the CPU(s) 902. The memory 905, including the non-volatile and volatile memory device(s) within the memory 905, comprises a non-transitory computer readable storage medium.

In some implementations, the memory 905 or the non-transitory computer readable storage medium of the memory 905 stores the following programs, modules and data structures, or a subset thereof, including an operating system 915, a network communication module 918, a user interface module 920, and a text-processing program 930.

The operating system 915 includes procedures for handling various basic system services and for performing hardware dependent tasks.

The network communication module 918 facilitates communication with other devices via the one or more communication network interfaces 908 (wired or wireless) and one or more communication networks, such as the internet, other wide area networks, local area networks, metropolitan area networks, and so on.

The user interface module 920 is configured to receive user inputs through the user interface 906.

The text-processing program 930 is configured to correct errors in a text, either independently or in combination with other text-processing and/or text-input programs. The text-processing program 930 comprises a selection module 932, a searching module 934, a word comparison module 936 and a sentence comparison module 938.

The selection module 932 is configured to select a target word in a target sentence by first predefined criteria. The selection module 932 comprises a word recognition algorithm 940, which is configured to recognize a character string as a word having a few Chinese characters based at least on Chinese grammar. In addition, the selection module 932 is configured to determine whether any word in a target sentence has a significant enough possibility of being wrong.

The searching module 934 is configured to search and acquire a group of words from a sentence database 942, each of which separates the first sequence of words from the second sequence of words in a sentence. The searching and acquiring process is illustrated in step 803 of FIG. 8 and details are not to be repeated here.

In accordance with some embodiments, the sentence database comprises text that is acquired from articles and dictionaries. The sentence database is updated periodically by acquiring sentences from internet sources. Periodic updating not only supplies more sentences but also helps to catch the ever-evolving language patterns and rules.

The word comparison module 936 is configured to select candidate words from the group of words. The similarity between a selected candidate word and the target word must be above a pre-set threshold according to second predefined criteria. The word comparison module 936 comprises a word comparison algorithm 944, which is configured to carry out the second predefined criteria.

The sentence comparison module 938 is configured to determine the fittest sentence among the candidate sentences. The determination is based on a linguistic model 946. The linguistic model can comprise multiple sets of criteria as illustrated in step 806 of FIG. 8 and can combine any of these sets of criteria depending on the circumstances.

While particular embodiments are described above, it will be understood that it is not intended to limit the invention to these particular embodiments. On the contrary, the invention includes alternatives, modifications and equivalents that are within the spirit and scope of the appended claims. Numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. But it will be apparent to one of ordinary skill in the art that the subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.

The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, operations, elements, components, and/or groups thereof.

As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting,” that a stated condition precedent is true, depending on the context. Similarly, the phrase “if it is determined [that a stated condition precedent is true]” or “if [a stated condition precedent is true]” or “when [a stated condition precedent is true]” may be construed to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true, depending on the context.

Although some of the various drawings illustrate a number of logical stages in a particular order, stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art and so do not present an exhaustive list of alternatives. Moreover, it should be recognized that the stages could be implemented in hardware, firmware, software or any combination thereof.

The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.

What is claimed is:
 1. A computer-implemented method, comprising: at a device having one or more processors and memory storing programs executed by the one or more processors: selecting a target word in a target sentence by first predefined criteria; from the target sentence, acquiring a first sequence of words that precede the target word and a second sequence of words that succeed the target word; from a sentence database, searching and acquiring a group of words, each of which separates the first sequence of words from the second sequence of words in a sentence; from the group of words, selecting candidate words whose similarity to the target word is above a pre-set threshold according to second predefined criteria; creating a candidate sentence for each of the candidate words by replacing the target word in the target sentence with each of the candidate words; determining the fittest sentence among the candidate sentences according to a linguistic model; and suggesting the candidate word within the fittest sentence as a correction.
 2. The method of claim 1, further comprising: after suggesting the candidate word within the fittest sentence, replacing the target word in the target sentence with the suggested candidate word.
 3. The method of claim 1, wherein the first predefined criteria include whether a character string is a word based at least on Chinese grammar.
 4. The method of claim 1, wherein acquiring the first sequence of words comprises determining length of the first sequence of words based at least on meaning of the target word.
 5. The method of claim 1, wherein acquiring the second sequence of words comprises determining length of the second sequence of words based at least on meaning of the target word.
 6. The method of claim 1, wherein the length of the first sequence of words is pre-set.
 7. The method of claim 1, wherein the linguistic model includes criteria for grammar.
 8. The method of claim 1, wherein the linguistic model includes criteria for meaning of every candidate sentence.
 9. The method of claim 1, wherein at least one candidate word whose similarity to the target word is determined based on the pronunciation of the candidate word.
 10. The method of claim 1, wherein the sentence database is updated periodically by acquiring sentences from internet sources.
 11. A text-processing device, comprising: one or more processors; memory; and one or more program modules stored in the memory and configured for execution by the one or more processors, the one or more program modules including instructions for: selecting a target word in a target sentence by first predefined criteria; from the target sentence, acquiring a first sequence of words that precede the target word and a second sequence of words that succeed the target word; from a sentence database, searching and acquiring a group of words, each of which separates the first sequence of words from the second sequence of words in a sentence; from the group of words, selecting candidate words whose similarity to the target word is above a pre-set threshold according to second predefined criteria; creating a candidate sentence for each of the candidate words by replacing the target word in the target sentence with each of the candidate words; determining the fittest sentence among the candidate sentences according to a linguistic model; and suggesting the candidate word within the fittest sentence as a correction.
 12. The text-processing device of claim 11, further comprising: after suggesting the candidate word within the fittest sentence, replacing the target word in the target sentence with the suggested candidate word.
 13. The text-processing device of claim 11, wherein the first predefined criteria include whether a character string is a word based at least on Chinese grammar.
 14. The text-processing device of claim 11, wherein acquiring the first sequence of words comprises determining length of the first sequence of words based at least on meaning of the target word.
 15. The text-processing device of claim 11, wherein the length of the first sequence of words is pre-set.
 16. The text-processing device of claim 11, wherein the linguistic model includes criteria for grammar.
 17. The text-processing device of claim 11, wherein the linguistic model includes criteria for meaning of every candidate sentence.
 18. The text-processing device of claim 11, wherein at least one candidate word whose similarity to the target word is determined based on the pronunciation of the candidate word.
 19. The text-processing device of claim 11, wherein the sentence database is updated periodically by acquiring sentences from internet sources.
 20. A non-transitory computer readable storage medium, storing one or more programs for execution by one or more processors of a computer system, the one or more programs including instructions for: selecting a target word in a target sentence by first predefined criteria; from the target sentence, acquiring a first sequence of words that precede the target word and a second sequence of words that succeed the target word; from a sentence database, searching and acquiring a group of words, each of which separates the first sequence of words from the second sequence of words in a sentence; from the group of words, selecting candidate words whose similarity to the target word is above a pre-set threshold according to second predefined criteria; creating a candidate sentence for each of the candidate words by replacing the target word in the target sentence with each of the candidate words; determining the fittest sentence among the candidate sentences according to a linguistic model; and suggesting the candidate word within the fittest sentence as a correction.